Automated Essay Scoring Versus Human Scoring: A Correlational Study
Abstract
The purpose of the current study was to analyze the relationship between automated essay scoring (AES) and human scoring in order to determine the validity and usefulness of AES for large-scale placement tests. Specifically, a correlational research design was used to examine the correlations between AES performance and human raters' performance. Spearman rank correlation coefficient tests were used for the data analyses. The results showed no statistically significant correlation between the overall holistic scores assigned by the AES tool and those assigned either by faculty human raters or by human raters who scored another standardized writing test. In contrast, there was a significant correlation between the scores assigned by the two teams of human raters. A significant correlation was also present between AES and faculty human scoring in Dimension 4 (Sentence Structure), but not in the other dimensions. The findings do not corroborate previous findings on AES tools. For English educators, they suggest that AES tools have limited capability at this point and that more reliable assessment measures, such as writing portfolios and conferencing, still need to be part of the methods repertoire.
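To make the statistical procedure concrete, the sketch below shows how a Spearman rank correlation between AES scores and human holistic scores can be computed in Python with SciPy. The score arrays and the 0.05 significance threshold are illustrative assumptions, not data or settings from the study.

```python
# A minimal sketch of the Spearman rank correlation analysis described
# above. The score arrays are hypothetical placeholders, not data from
# the study.
from scipy.stats import spearmanr

# Hypothetical overall holistic scores (1-6 scale) for the same ten essays
aes_scores   = [4, 3, 5, 2, 4, 6, 3, 5, 2, 4]
human_scores = [3, 3, 4, 2, 5, 5, 4, 4, 3, 4]

# spearmanr ranks each array and correlates the ranks; it returns the
# coefficient (rho) and a two-sided p-value.
rho, p_value = spearmanr(aes_scores, human_scores)
print(f"Spearman's rho = {rho:.3f}, p = {p_value:.3f}")

# A conventional decision rule: treat the correlation as statistically
# significant when p < 0.05 (assumed here; the study's alpha is not stated).
if p_value < 0.05:
    print("Statistically significant correlation")
else:
    print("No statistically significant correlation")
```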
Similar Articles
An Evaluation of IntelliMetricTM Essay Scoring System Using Responses to GMAT® AWA Prompts
The Graduate Management Admission Council® (GMAC®) has long benefited from advances in automated essay scoring. When GMAC® adopted ETS® e-rater® in 1999, the Council’s flagship product, the Graduate Management Admission Test® (GMAT®), became the first large-scale assessment to incorporate automated essay scoring. The change was controversial at the time (Iowa State Daily, 1999; Calfee, 2000). T...
Stumping e-rater: challenging the validity of automated essay scoring
This report presents the findings of a research project funded by and carried out under the auspices of the Graduate Record Examinations Board. Researchers are encouraged to express freely their professional judgment. Therefore, points of view or opinions stated in Graduate Record Examinations Board Reports do not necessarily represent official Graduate Record Examinations Board position or poli...
Evidence for the Interpretation and Use of Scores from an Automated Essay Scorer
This paper examined validity evidence for the scores based on the Intelligent Essay Assessor (IEA), an automated essay-scoring engine developed by Pearson Knowledge Technologies. A study was carried out using the validity framework described by Yang et al. (2002). This framework delineates three approaches to validation studies: examine the relationship among scores given to the same essays by...
Investigating neural architectures for short answer scoring
Neural approaches to automated essay scoring have recently shown state-of-the-art performance. The automated essay scoring task typically involves a broad notion of writing quality that encompasses content, grammar, organization, and conventions. This differs from the short answer content scoring task, which focuses on content accuracy. The inputs to neural essay scoring models – n-grams and embe...
Enriching Automated Essay Scoring Using Discourse Marking
Electronic Essay Rater (e-rater) is a prototype automated essay scoring system built at Educational Testing Service (ETS) that uses discourse marking, in addition to syntactic information and topical content vector analyses, to automatically assign essay scores. This paper gives a general description of e-rater as a whole, but its emphasis is on the importance of discourse marking and argument pa...